
Add similarity service for gRPC #346


Open
kozistr wants to merge 3 commits into main

Conversation

@kozistr (Contributor) commented Jul 16, 2024

What does this PR do?

Implements the Similarity service for gRPC.

  • I named the field distances, following the naming used here; similarities might be a more appropriate name, though. (A rough sketch of what these values represent follows this list.)
  • SimilarityStreamRequest only accepts one pair (source_sentence and sentence), in the same manner as RerankStreamRequest.
  • I'm not sure whether my implementation of the SimilarityStream rpc is correct. For now, it simply infers source_sentence and sentence sequentially in the similarity_inner closure; there may well be a more efficient approach. Any feedback is welcome :)
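
For context, here is a minimal, illustrative sketch (not the server code from this PR) of what the distances values could correspond to, assuming cosine similarity between the source embedding and each sentence embedding; the function names are made up for illustration:

// Illustrative only: score a source embedding against several sentence
// embeddings, assuming cosine similarity as the metric.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (norm_a * norm_b)
}

// One score per candidate sentence, in the same order as the request.
fn similarities(source: &[f32], sentences: &[Vec<f32>]) -> Vec<f32> {
    sentences.iter().map(|s| cosine_similarity(source, s)).collect()
}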

Logs

Server

2024-07-16T13:46:57.285050Z  INFO text_embeddings_router: router/src/main.rs:175: Args { model_id: "./mul*********-**-**rge", revision: None, tokenization_workers: None, dtype: None, pooling: None, max_concurrent_requests: 512, max_batch_tokens: 16384, max_batch_requests: None, max_client_batch_size: 32, auto_truncate: false, default_prompt_name: None, default_prompt: None, hf_api_token: None, hostname: "0.0.0.0", port: 8080, uds_path: "/tmp/text-embeddings-inference-server", huggingface_hub_cache: None, payload_limit: 2000000, api_key: None, json_output: false, otlp_endpoint: None, otlp_service_name: "text-embeddings-inference.server", cors_allow_origin: None }
2024-07-16T13:46:57.737392Z  INFO text_embeddings_router: router/src/lib.rs:199: Maximum number of tokens per request: 512
2024-07-16T13:46:57.741260Z  INFO text_embeddings_core::tokenization: core/src/tokenization.rs:28: Starting 4 tokenization workers
2024-07-16T13:46:58.614438Z  INFO text_embeddings_router: router/src/lib.rs:241: Starting model backend
2024-07-16T13:47:29.929738Z  WARN text_embeddings_router: router/src/lib.rs:267: Backend does not support a batch size > 8
2024-07-16T13:47:29.929764Z  WARN text_embeddings_router: router/src/lib.rs:268: forcing `max_batch_requests=8`
2024-07-16T13:47:29.932608Z  INFO text_embeddings_router::grpc::server: router/src/grpc/server.rs:1810: Serving Prometheus metrics: 0.0.0.0:9000
2024-07-16T13:47:29.938856Z  INFO text_embeddings_router::grpc::server: router/src/grpc/server.rs:1954: Starting gRPC server: 0.0.0.0:8080
2024-07-16T13:47:29.938884Z  INFO text_embeddings_router::grpc::server: router/src/grpc/server.rs:1955: Ready
2024-07-16T13:58:42.515709Z  INFO similarity{compute_chars=75 compute_tokens=21 total_time="104.540284ms" tokenization_time="151.525µs" queue_time="294.65µs" inference_time="103.940484ms"}: text_embeddings_router::grpc::server: router/src/grpc/server.rs:1507: Success

Client

$ grpcurl -d '{"source_sentence": "What is Deep Learning", "sentences": ["What is Machine Learning", "asdf", "hello"]}' -plaintext 0.0.0.0:8080 tei.v1.Similarity/Similarity
{
  "distances": [
    0.927782,
    0.7332565,
    0.7520622
  ],
  "metadata": {
    "computeChars": 75,
    "computeTokens": 21,
    "totalTimeNs": "104551084",
    "tokenizationTimeNs": "151525",
    "queueTimeNs": "294650",
    "inferenceTimeNs": "103940484"
  }
}
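
For reference, an illustrative (untested) call to the streaming RPC discussed above, sending one source_sentence/sentence pair per message via grpcurl's stdin mode; the method path and request fields are inferred from the description rather than copied from the proto:

$ grpcurl -d @ -plaintext 0.0.0.0:8080 tei.v1.Similarity/SimilarityStream <<EOM
{"source_sentence": "What is Deep Learning", "sentence": "What is Machine Learning"}
{"source_sentence": "What is Deep Learning", "sentence": "asdf"}
EOM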

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@OlivierDehaene OR @Narsil
